cranelift(aarch64): lower bare ctz/clz boolean tests via tst/cmp+Cond #13336
ggreif wants to merge 2 commits into
Follow-up to bytecodealliance#13332. That PR added egraph rules collapsing `(eq (ctz X) 0)` / `(ne (ctz X) 0)` and the clz analogues to direct LSB / sign-bit tests, but only when the comparison is mediated by an explicit `icmp`. The wasm front-end translates wasm `if (ctz X)` to `brif (ireduce.i32 (ctz.i64 X))` directly (no `icmp`), so the egraph rules don't fire on the wasm-natural shape.

This commit closes the gap by specialising `is_nonzero` in the x64 backend, the helper that all `brif`/`select`/`trapif` lowerings funnel through. Four rules: `ctz`/`clz` × bare/`ireduce`-wrapped. The `ireduce` variant catches the wasm front-end's `i32.wrap_i64` over a 64-bit `ctz`/`clz`, which is a no-op on values in [0, bitwidth].

Test deltas (tests/disas/ctz-clz-bool-condition.wat):

- if_ctz_bare_i32: 5 insns -> 2 (`testl $1, %edx; je`)
- if_ctz_bare_i64: 5 insns -> 2 (`testq $1, %rdx; je`)
- if_clz_bare_i32: 7 insns -> 2 (`testl %edx, %edx; jns`)

The icmp-mediated cases (collapsed by bytecodealliance#13332's egraph rules) are unchanged. The numeric-comparison negative test stays untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
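The identities this commit message relies on can be checked outside Cranelift. The following is a minimal Rust sketch, not code from the PR: the function names are hypothetical, and it assumes that CLIF `ctz`/`clz` return the bit width for a zero input, which is also what Rust's `trailing_zeros`/`leading_zeros` do.

```rust
/// `ctz(x) != 0`: the condition that `brif (ctz X)` branches on.
/// `trailing_zeros` returns 64 for `x == 0`, so zero counts as "nonzero ctz".
pub fn ctz_is_nonzero(x: u64) -> bool {
    x.trailing_zeros() != 0
}

/// The single-bit test that replaces it: is the least-significant bit clear?
pub fn lsb_is_clear(x: u64) -> bool {
    x & 1 == 0
}

/// `clz(x) != 0`: the condition that `brif (clz X)` branches on.
pub fn clz_is_nonzero(x: u64) -> bool {
    x.leading_zeros() != 0
}

/// The sign-bit test that replaces it: is `x` non-negative as a signed value?
pub fn sign_bit_is_clear(x: u64) -> bool {
    (x as i64) >= 0
}

/// Models `brif (ireduce.i32 (ctz.i64 X))`. The 64-bit count lies in
/// [0, 64], so truncating it to 32 bits cannot turn a nonzero count into 0.
pub fn ireduced_ctz_is_nonzero(x: u64) -> bool {
    let count: u64 = x.trailing_zeros() as u64; // ctz.i64
    (count as u32) != 0 // ireduce.i32, then the nonzero test
}
```

For every `x`, `ctz_is_nonzero(x) == lsb_is_clear(x)` and `clz_is_nonzero(x) == sign_bit_is_clear(x)`, which is why a two-instruction test-and-branch suffices for the bare form.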
aarch64 analogue of the x64 follow-up. Specialises `is_nonzero (ctz X)` and `is_nonzero (clz X)` (plus their `ireduce`-wrapped variants) so the wasm-natural `brif (ireduce.i32 (ctz.i64 X))` shape lowers to a single bit-test instead of `rbit; clz; cmp; b.cond`.

- ctz: `tst Xn, #1` + `Cond.Eq` (branches when LSB is clear).
- clz: `cmp Xn, #0` + `Cond.Pl` (branches when sign bit is clear).

Test deltas (tests/disas/aarch64-ctz-clz-bool-condition.wat):

- if_ctz_bare_i32: `tst w4, #1; b.eq`
- if_ctz_bare_i64: `tst x4, #1; b.eq`
- if_clz_bare_i32: `cmp w4, #0; b.pl`

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
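To see why `Cond.Eq` and `Cond.Pl` are the right condition codes, it can help to model the two flags involved. This is a rough Rust sketch under stated assumptions: `Flags`, `tst`, `cmp_zero`, `cond_eq` and `cond_pl` are hypothetical names for illustration, not Cranelift or Arm APIs, and only the N and Z flags are modelled.

```rust
/// Hypothetical model of the two NZCV flags these lowerings read.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Flags {
    pub n: bool, // set when bit 63 of the result is 1
    pub z: bool, // set when the result is zero
}

/// `tst xn, #imm`: bitwise AND, result discarded, flags kept.
pub fn tst(x: u64, imm: u64) -> Flags {
    let r = x & imm;
    Flags { n: (r >> 63) == 1, z: r == 0 }
}

/// `cmp xn, #0`: subtracts zero, so N is the sign bit of `x` and Z is `x == 0`.
pub fn cmp_zero(x: u64) -> Flags {
    Flags { n: (x >> 63) == 1, z: x == 0 }
}

/// `b.eq` is taken when Z is set.
pub fn cond_eq(f: Flags) -> bool {
    f.z
}

/// `b.pl` is taken when N is clear.
pub fn cond_pl(f: Flags) -> bool {
    !f.n
}
```

Under this model, `cond_eq(tst(x, 1))` agrees with `x.trailing_zeros() != 0`, and `cond_pl(cmp_zero(x))` agrees with `x.leading_zeros() != 0`, for every 64-bit `x`.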
Ideally we would do this in the mid-end, not in every back-end individually (and then revert #13334 as well).

In principle this would work by matching on the brif, not the clz. Your mid-end PR only considered simplifications of the clz itself, which is why you didn't see this option, I think. In other words, it's not valid to simplify

I can't seem to get the
…keleton` (bytecodealliance#13343)

* cranelift: fold `ctz`/`clz` directly into `brif` cond via `simplify_skeleton`

  The mid-end rules added in bytecodealliance#13332 hinge on an `icmp eq/ne (ctz/clz X) 0` shape, i.e. the wasm 3-op pattern `i32.ctz; i32.eqz; br_if`. Frontends that emit the 2-op form `i32.ctz; br_if` (e.g. Motoko's `moc` after its `and 1; eqz; br_if` → `ctz; br_if` byte-size peephole) feed `(brif (ctz X))` into cranelift with no `icmp` for the existing rules to match.

  This commit extends `simplify_skeleton` to rewrite the *condition operand* of an existing `brif` in place, without touching its opcode or successor blocks (CFG-preserving by construction). A new `SkeletonInstSimplification` variant `ReplaceBranchCond(Value)` carries the new condition; the egraph driver applies it by writing through `inst_args_mut`.

  Two ISLE rules in `opts/icmp.isle` rewrite `(brif (ctz X) bt be)` and `(brif (clz X) bt be)` to brifs over the equivalent bit-extract form:

  - `brif (ctz X) bt be` → `brif (eq (band X 1) 0) bt be`
  - `brif (clz X) bt be` → `brif (sge X 0) bt be`

  End-to-end lowering on the resulting brif then composes with existing backend `icmp+brif` fusion to produce:

  - x86_64 `brif (ctz X)`: `testl $1, %edi; je`
  - x86_64 `brif (clz X)`: `testl %edi, %edi; jge`
  - aarch64 `brif (ctz X)`: `tbz w0, #0` (single-instruction test-and-branch)

  This subsumes the backend-side x64 rules added in bytecodealliance#13334 and the aarch64 rules in bytecodealliance#13336 (and yields tighter aarch64 code than bytecodealliance#13336 did).

  The driver still rejects non-`brif` branches and rejects non-`ReplaceBranchCond` simplification variants on `brif` (a `Replace inst` of a brif would risk changing successor block IDs and is left to a future, broader extension).

  Filetest `egraph/brif-cnt-cond.clif` covers ctz/clz over i32/i64 in the 2-op form.

  Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* rustfmt: collapse `is_branch` && opcode-guard onto one line

* tests/disas: re-bless ctz/clz-bool-condition for new mid-end fold

  The new `simplify_skeleton`-on-`brif` rule rewrites the 2-op `if (ctz/clz x)` cases that bytecodealliance#13332's commentary noted were the non-icmp-mediated holdouts. Bare-form lowering shrinks from ~9 instructions (bsf/bsr + cmov + test + jne + …) to `testl $1, %edx; je` (ctz) and `testl %edx, %edx; jge` (clz). Offsets on the subsequent non-bare functions shift down to match.

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
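The shape of the mid-end rewrite described in this commit (replace only the condition of a `brif`, leave the successors alone) can be illustrated on a toy IR. The Rust sketch below is an assumption-laden illustration, not Cranelift code: `Expr`, `Brif`, `simplify_brif_cond` and `apply_replace_branch_cond` are hypothetical stand-ins, and the real implementation uses ISLE rules plus the egraph driver rather than a Rust `match`.

```rust
/// Toy expression tree; NOT Cranelift's real `Value` or instruction types.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expr {
    Var(&'static str),
    Const(i64),
    Ctz(Box<Expr>),
    Clz(Box<Expr>),
    Band(Box<Expr>, Box<Expr>),
    IcmpEq(Box<Expr>, Box<Expr>),
    IcmpSge(Box<Expr>, Box<Expr>),
}

/// Toy `brif`: a condition plus two successor block IDs (`bt` and `be`).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Brif {
    pub cond: Expr,
    pub block_then: u32,
    pub block_else: u32,
}

/// Mirrors the two rewrite rules: returns a replacement condition when the
/// current condition is a bare `ctz` or `clz`, and `None` otherwise.
pub fn simplify_brif_cond(cond: &Expr) -> Option<Expr> {
    match cond {
        // brif (ctz X) -> brif (eq (band X 1) 0)
        Expr::Ctz(x) => Some(Expr::IcmpEq(
            Box::new(Expr::Band(x.clone(), Box::new(Expr::Const(1)))),
            Box::new(Expr::Const(0)),
        )),
        // brif (clz X) -> brif (sge X 0)
        Expr::Clz(x) => Some(Expr::IcmpSge(x.clone(), Box::new(Expr::Const(0)))),
        _ => None,
    }
}

/// Applies the replacement in place. Only `cond` is written, so the
/// successor block IDs (and hence the CFG) cannot change.
pub fn apply_replace_branch_cond(brif: &mut Brif) -> bool {
    match simplify_brif_cond(&brif.cond) {
        Some(new_cond) => {
            brif.cond = new_cond;
            true
        }
        None => false,
    }
}
```

A condition such as `IcmpEq(Ctz(x), Const(4))` falls through to `None` here, which corresponds to the numeric-comparison negative test staying untouched.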
aarch64 analogue of #13334; egraph counterpart in #13332.

Same shape as the x64 follow-up: specialise `is_nonzero (ctz X)` / `is_nonzero (clz X)` (and their `ireduce`-wrapped variants) in `cranelift/codegen/src/isa/aarch64/inst.isle`, so the wasm-natural `brif (ireduce.i32 (ctz.i64 X))` shape lowers to a single bit-test instead of `rbit; clz; cmp; b.cond`.

aarch64-specific instructions used:

- ctz: `tst Xn, #1` (logical AND with immediate, flags only) + `Cond.Eq`: branches when LSB is clear.
- clz: `cmp Xn, #0` + `Cond.Pl`: branches when sign bit (N flag) is clear, i.e. X is signed-non-negative.

Test deltas (`tests/disas/aarch64-ctz-clz-bool-condition.wat`, newly added):

| Function | Before | After |
| --- | --- | --- |
| `if_ctz_bare_i32` | `rbit + clz + ...` | `tst w4, #1; b.eq` |
| `if_ctz_bare_i64` | | `tst x4, #1; b.eq` |
| `if_clz_bare_i32` | `clz + ...` | `cmp w4, #0; b.pl` |

Negative test (`(ctz X) == 4`) correctly untouched.

Same motivation as #13334: closes the gap for non-Rust wasm frontends like Motoko's `moc`.

riscv64 and s390x to follow.